A Model for Learning Words in a Language by Crawling the Web

Authors

  • Jeffrey J. Thomson
  • Rex E. Gantenbein
Abstract

A model for an Internet web crawler with a very limited vocabulary can be devised to learn most words in the English language. The system will be able to read a sentence in which only some of the constituents are known. To achieve this, the system will provide a methodology for resolving ambiguities in the unknown constituent words and their parts of speech. The system will include a lexicon of the word types to be learned (nouns, verbs, adjectives, and adverbs), seeded with 100 random words. The source material being read need not be domain specific. As the overall lexicon of the system improves and grows, domain-specific and technically advanced jargon can be handled more readily. Some categories of word types, such as determiners (the, a, each, all, etc.), conjunctions (and, or, but), and prepositions (by, along, etc.), can easily be enumerated exhaustively. For categories whose English lexicon is large (nouns, verbs, and adjectives), an algorithm will be used to determine the nature of unknown words.
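The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: closed-class words are enumerated up front, a few open-class words are seeded, and a single unknown word in a sentence is guessed from the categories of its neighbours. The function names, the seed words, and the neighbour rules are all illustrative assumptions and should not be read as the authors' method.

```python
# Hypothetical sketch: names, seed words, and rules are illustrative
# assumptions, not the algorithm described in the paper.

# Closed-class categories, enumerated using the examples from the abstract.
CLOSED_CLASS = {
    "determiner": {"the", "a", "each", "all"},
    "conjunction": {"and", "or", "but"},
    "preposition": {"by", "along"},
}


def make_lexicon(seed):
    """Build a word -> category map from the closed-class lists plus seed words."""
    lexicon = {w: cat for cat, words in CLOSED_CLASS.items() for w in words}
    lexicon.update(seed)
    return lexicon


def guess_category(tokens, i, lexicon):
    """Guess the category of the token at index i from its known neighbours."""
    prev = lexicon.get(tokens[i - 1]) if i > 0 else None
    nxt = lexicon.get(tokens[i + 1]) if i + 1 < len(tokens) else None
    if prev == "determiner" and nxt == "noun":
        return "adjective"
    if prev in ("determiner", "adjective", "preposition"):
        return "noun"
    if prev == "noun":
        return "verb"
    if prev == "verb":
        return "adverb"
    return None  # not enough context to guess


def learn_from_sentence(sentence, lexicon):
    """Record a guess only when exactly one word in the sentence is unknown."""
    tokens = sentence.lower().split()  # naive tokenizer: no punctuation handling
    unknown = [i for i, t in enumerate(tokens) if t not in lexicon]
    if len(unknown) != 1:
        return None
    i = unknown[0]
    category = guess_category(tokens, i, lexicon)
    if category is not None:
        lexicon[tokens[i]] = category
    return category
```

With "dog" and "runs" seeded as a noun and a verb, calling `learn_from_sentence("the crawler runs", lexicon)` records "crawler" as a noun, and each learned word widens the set of sentences that contain only one unknown. A real system would need far richer disambiguation than these neighbour rules.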

Similar resources

The effect of language complexity and group size on knowledge construction: Implications for online learning

This study investigated the effect of language complexity and group size on knowledge construction in two online debates. Knowledge construction was assessed using Gunawardena et al.’s Interaction Analysis Model (1997). Language complexity was determined by dividing the number of unique words by total words. It refers to the lexical variation. The results showed that...


Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...


Effective Learning to Rank Persian Web Content

The Persian language is one of the most widely used languages in the Web environment. Hence, the Persian Web includes invaluable information that needs to be retrieved effectively. As with other languages, ranking algorithms for Persian Web content deal with different challenges, such as applicability issues in real-world situations as well as the lack of user modeling. CF-Rank, as a ...


Impact of Using Web-quests on Learning Vocabulary by Iranian Pre-university Students

Web-quests are internet-based technology applications in which groups of students follow a specific set of steps toward the completion of a final project on a specific subject or a multi-disciplinary subject. The present study aimed to investigate the impacts of using web-quests on learning vocabulary by Iranian pre-university students. The sample of the study consisted of 72 students assigned ...


English Teachers Professional Development Needs for Web Development Skills: Meeting the Challenges of Teaching English Language in the Information Age

Utilizing the resources of the web in educational practices has made instructional processes more efficient and interesting and has made the learning process much easier and more attractive. With the web, English language teachers now have the option of engaging learners in online (web-based) instruction in addition to the use of conventional classroom instruction or alternativel...


Publication date: 2009